GenDocSum + MCLR: Generic document summarization based on maximum coverage and less redundancy
نویسندگان
چکیده
0957-4174/$ see front matter 2012 Elsevier Ltd. A http://dx.doi.org/10.1016/j.eswa.2012.04.067 ⇑ Corresponding author. E-mail addresses: [email protected], a.ramiz@ With the rapid growth of information on the Internet and electronic government recently, automatic multi-document summarization has become an important task. Multi-document summarization is an optimization problem requiring simultaneous optimization of more than one objective function. In this study, when building summaries from multiple documents, we attempt to balance two objectives, content coverage and redundancy. Our goal is to investigate three fundamental aspects of the problem, i.e. designing an optimization model, solving the optimization problem and finding the solution to the best summary. We model multi-document summarization as a Quadratic Boolean Programing (QBP) problem where the objective function is a weighted combination of the content coverage and redundancy objectives. The objective function measures the possible summaries based on the identified salient sentences and overlap information between selected sentences. An innovative aspect of our model lies in its ability to remove redundancy while selecting representative sentences. The QBP problem has been solved by using a binary differential evolution algorithm. Evaluation of the model has been performed on the DUC2002, DUC2004 and DUC2006 data sets. We have evaluated our model automatically using ROUGE toolkit and reported the significance of our results through 95% confidence intervals. The experimental results show that the optimization-based approach for document summarization is truly a promising research direction. 2012 Elsevier Ltd. All rights reserved.
منابع مشابه
Multi-Document Summarization Model Based on Integer Linear Programming
This paper proposes an extractive generic text summarization model that generates summaries by selecting sentences according to their scores. Sentence scores are calculated using their extensive coverage of the main content of the text, and summaries are created by extracting the highest scored sentences from the original document. The model formalized as a multiobjective integer programming pr...
متن کاملTowards Abstractive Multi-Document Summarization Using Submodular Function-Based Framework, Sentence Compression and Merging
We propose a submodular function-based summarization system which integrates three important measures namely importance, coverage, and non-redundancy to detect the important sentences for the summary. We design monotone and submodular functions which allow us to apply an efficient and scalable greedy algorithm to obtain informative and well-covered summaries. In addition, we integrate two abstr...
متن کاملText Summarization Model based on Redundancy-Constrained Knapsack Problem
In this paper we propose a novel text summarization model, the redundancy-constrained knapsack model. We add to the Knapsack problem a constraint to curb redundancy in the summary. We also propose a fast decoding method based on the Lagrange heuristic. Experiments based on ROUGE evaluations show that our proposals outperform a state-of-the-art text summarization model, the maximum coverage mode...
متن کاملMulti-Document Arabic Summarization Using Text Clustering to Reduce Redundancy
“The process of multi-document summarization is producing a single summary of a collection of related documents. In this work we focus on generic extractive Arabic multi-document summarizers. We also describe the cluster approach for multi-document summarization. The problem with multi-document text summarization is redundancy of sentences, and thus, redundancy must be eliminated to ensure cohe...
متن کاملSummary Evaluation with and without References
We study a new content-based method for the evaluation of text summarization systems without human models which is used to produce system rankings. The research is carried out using a new content-based evaluation framework called FRESA to compute a variety of divergences among probability distributions. We apply our comparison framework to various well-established content-based evaluation measu...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Expert Syst. Appl.
دوره 39 شماره
صفحات -
تاریخ انتشار 2012